Courses@HKUST
- EESM5060 Embedded Systems
- ELEC6910A First Principles of CV
Books
- Digital Integrated Circuits, A Design Perspective. Second Edition
- CMOS VLSI Design, A Circuits and Systems Perspective. Fourth Edition
- Verilog Digital System Design. Second Edition
- Computer Architecture, A Quantitative Approach. Sixth Edition
- Dive into Deep Learning (动手学深度学习), Release 2.0.0-beta1
- Research on Computing Architecture and Memory Optimization Techniques for Neural Network Accelerators (神经网络加速器的计算架构及存储优化技术研究). (Thanks to Prof. Fengbin Tu, the author, for gifting me this book.)
Papers
- AutoDCIM: An Automated Digital CIM Compiler
- Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
- DaDianNao: A Machine-Learning Supercomputer
- Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
- PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory
- Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips
- Stripes: Bit-Serial Deep Neural Network Computing
- A 4nm 6163-TOPS/W/b 4790-TOPS/mm2/b SRAM Based Digital-Computing-in-Memory Macro Supporting Bit-Width Flexibility and Simultaneous MAC and Weight Update
- An 89TOPS/W and 16.3TOPS/mm2 All-Digital SRAM-Based Full-Precision Compute-In-Memory Macro in 22nm for Machine-Learning Edge Applications
- A 5-nm 254-TOPS/W 221-TOPS/mm2 Fully-Digital Computing-in-Memory Macro Supporting Wide-Range Dynamic-Voltage-Frequency Scaling and Simultaneous MAC and Write Operations
- A 12nm 121-TOPS/W 41.6-TOPS/mm2 All Digital Full Precision SRAM-based Compute-in-Memory with Configurable Bit-width For AI Edge Applications
- An Ultra-Low-Voltage Bit-Interleaved Synthesizable 13T SRAM Circuit
- All-Digital Time-Domain Compute-in-Memory Engine for Binary Neural Networks With 1.05 POPS/W Energy Efficiency
- Multi-Function CIM Array for Genome Alignment Applications built with Fully Digital Flow
- Compiling All-Digital-Embedded Content Addressable Memories on Chip for Edge Application
- AI SoC Design in the Foundation Model Era
- Benchmark and Modelling for SRAM-based CIM
- A Survey of Accelerator Architecture for DNNs
- DIMC: 2219TOPS/W 2569F2/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware
- A 28nm 38-to-102-TOPS/W 8b Multiply-Less Approximate Digital SRAM Compute-In-Memory Macro for Neural-Network Inference
- Approximate De-randomizer for Stochastic Circuits
- Algorithm-Software-Hardware Co-Design for Deep Learning Acceleration
- Timeloop: A Systematic Approach to DNN Accelerator Evaluation
- DynaPlasia: An eDRAM In-Memory-Computing-Based Reconfigurable Spatial Accelerator with Triple-Mode Cell for Dynamic Resource Switching
- A 28nm 11.2TOPS/W Hardware-Utilization-Aware Neural-Network Accelerator with Dynamic Dataflow
- Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach (MAESTRO)
- MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures
- Towards Heterogeneous Multi-core Accelerators Exploiting Fine-grained Scheduling of Layer-Fused Deep Neural Networks
- DIANA: An End-to-End Energy-Efficient DIgital and ANAlog Hybrid Neural Network SoC
- MARS: Multimacro Architecture SRAM CIM-Based Accelerator With Co-Designed Compressed Neural Networks
- Scalable and Programmable Neural Network Inference Accelerator Based on In-Memory Computing
- Fused-Layer CNN Accelerators
- Automatic Generation of Structured Macros Using Standard Cells ‒ Application to CIM